Skip to content

fix: reduce API test memory consumption from 8.26GB to 1.57GB (#8263) #9097

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

highlyavailable
Copy link

@highlyavailable highlyavailable commented May 22, 2025

Memory Optimization Fix for API Tests - Issue #8263

API tests were consuming excessive memory (>40 GiB virtual, >6 GiB resident), causing slower test execution and reduced developer adoption.

RCA

Memory Profile Findings (Before Optimization)

  • Total Memory Allocation: 8.26 GB
  • Ristretto Cache Dominance: 6.7 GB (81% of total allocation)
    • ristretto.newCmRow: 4.5 GB (54.42%)
    • ristretto.Bloom.Size: 2.2 GB (27.21%)

Root Cause

  1. Default Production Configuration in Tests: Tests were using production-level cache sizes:

    • committed.local_cache.size_bytes: 1 GB (default)
    • committed.sstable.memory.cache_size_bytes: 400 MB (default)
  2. Ristretto Cache Overhead: The ristretto cache allocates 10 million counters regardless of actual cache capacity, creating massive memory overhead for test environments.

  3. Multiple Cache Instances: Each test creates two pyramid filesystem caches:

    • MetaRange FS: ~50% of local cache allocation
    • Range FS: ~50% of local cache allocation
  4. Test Repetition: 50+ controller tests each calling setupHandler() with fresh cache instances.

Results

Memory Allocation Improvement

  • Before: 8.26 GB total allocation
  • After: 1.57 GB total allocation
  • Reduction: 81% memory reduction (6.69 GB saved)

Ristretto Cache Impact

  • Before: 6.7 GB ristretto allocation (81% of total)
  • After: 50 MB ristretto allocation (3.2% of total)
  • Ristretto Reduction: 99.25% reduction in ristretto overhead

Testing

  • Tests continue to pass with identical functionality
  • Memory footprint now suitable for dev environments
  • There shouldn't be any impact to prod configurations (only affects test environment)

Verification

  • Full controller test suite runs successfully
  • Memory usage reduced from 8.26 GB to 1.57 GB
  • No functional regressions detected
  • Production configuration remains unchanged

Fixes #8263

…rse#8263)

Addresses excessive memory usage in API tests that consumed >40GB virtual memory and >6GB resident memory, causing slower test execution.

Root cause: Tests inherited production cache configurations (1GB local cache, 400MB SSTable cache) and ristretto's fixed 10M counter allocation created massive overhead for small test caches.

Changes:

- Set test-specific cache sizes (8MB local, 2MB SSTable) in serve_test.go

- Scale ristretto counters adaptively based on cache capacity in eviction.go

- Maintain full test functionality while reducing memory footprint by 81%

Before: 8.26GB total allocation (6.7GB from ristretto caches)

After: 1.57GB total allocation (50MB from ristretto caches)

Tests: All controller tests pass with identical functionality
@CLAassistant
Copy link

CLAassistant commented May 22, 2025

CLA assistant check
All committers have signed the CLA.

@nopcoder nopcoder self-requested a review June 4, 2025 14:29
@nopcoder nopcoder self-assigned this Jun 4, 2025
@arielshaqed
Copy link
Contributor

Does this fix #8263? 🕺🏽

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving to @nopcoder to review properly. Just preferences about one of the default values - which you should both feel free to ignore.

Overall this is really cool and exciting - looking forward to a faster api.test!!1!

@@ -113,6 +113,11 @@ func setupHandler(t testing.TB) (http.Handler, *dependencies) {
// Add endpoint so that 'IsAdvancedAuth' will be in effect
viper.Set("auth.api.endpoint", config.DefaultListenAddress)

// Set minimal cache sizes for testing to reduce memory consumption
// Default production values are too large for test environments
viper.Set("committed.local_cache.size_bytes", 8*1024*1024) // 8MB instead of 1GB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
viper.Set("committed.local_cache.size_bytes", 8*1024*1024) // 8MB instead of 1GB
viper.Set("committed.local_cache.size_bytes", 24*1024*1024)
  1. Remove the comment - keeping it means we need to keep this file sync'ed up with defaults.go, and we won't do that.
  2. We'll anyway probably never hit this in a small test, but if we do - it will slow down the API test.

Copy link
Contributor

@nopcoder nopcoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for you contribution! I like to merge this change, can you address ariel's comment (removing the comment about the default) and we'll merge it.

@nopcoder nopcoder added area/testing Improvements or additions to tests exclude-changelog PR description should not be included in next release changelog minor-change Used for PRs that don't require issue attached labels Jun 6, 2025
Copy link
Contributor

@nopcoder nopcoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good - thanks, left a comment related to the lint errors.

Comment on lines +33 to +35
numCountersToUse = capacity / (1024 * 10) // ~100 counters per 10KB
if numCountersToUse < 1000 {
numCountersToUse = 1000 // Minimum threshold
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set constants for the numbers with meaningful names, our lint rules don't like magic numbers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/testing Improvements or additions to tests exclude-changelog PR description should not be included in next release changelog minor-change Used for PRs that don't require issue attached
Projects
None yet
Development

Successfully merging this pull request may close these issues.

lakeFS api test runs with >40 GiB of virtual memory, >6 GiB of which is resident
4 participants